North Macedonia
Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift
As machine learning models are increasingly deployed in dynamic environments, it becomes paramount to assess and quantify uncertainties associated with distribution shifts. A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance. The prediction interval, which captures the range of likely outcomes for a given prediction, serves as a crucial tool for characterizing uncertainties induced by their underlying distribution. In this paper, we propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain under unsupervised domain shift, under which we have labeled samples from a related source domain and unlabeled covariates from the target domain. Our analysis encompasses scenarios where the source and the target domain are related via i) a bounded density ratio, and ii) a measure-preserving transformation. Our proposed methodologies are computationally efficient and easy to implement.
Irony Detection, Reasoning and Understanding in Zero-shot Learning
Irony is a powerful figurative language (FL) on social media that can potentially mislead various NLP tasks, such as recommendation systems, misinformation checks, and sentiment analysis. Understanding the implicit meaning of this kind of subtle language is essential to mitigate irony's negative impact on NLP tasks. However, building models to understand irony presents a unique set of challenges, because irony is a complex form of language that often relies on context, tone, and subtle cues to convey meaning that is opposite or different from the literal interpretation. Large language models, such as ChatGPT, are increasingly able to capture implicit and contextual information. In this study, we investigate the generalization, reasoning and understanding ability of ChatGPT on irony detection across six different genre irony detection datasets. Our findings suggest that ChatGPT appears to show an enhanced language understanding and reasoning ability. But it needs to be very careful in prompt engineering design. Thus, we propose a prompt engineering design framework IDADP to achieve higher irony detection accuracy, improved understanding of irony, and more effective explanations compared to other state-of-the-art ChatGPT zero-shot approaches. And ascertain via experiments that the practice generated under the framework is likely to be the promised solution to resolve the generalization issues of LLMs.
Examining the Role of Relationship Alignment in Large Language Models
Altenburger, Kristen M., Jiang, Hongda, Kraut, Robert E., Wang, Yi-Chia, Dwivedi-Yu, Jane
The rapid development and deployment of Generative AI in social settings raise important questions about how to optimally personalize them for users while maintaining accuracy and realism. Based on a Facebook public post-comment dataset, this study evaluates the ability of Llama 3.0 (70B) to predict the semantic tones across different combinations of a commenter's and poster's gender, age, and friendship closeness and to replicate these differences in LLM-generated comments. The study consists of two parts: Part I assesses differences in semantic tones across social relationship categories, and Part II examines the similarity between comments generated by Llama 3.0 (70B) and human comments from Part I given public Facebook posts as input. Part I results show that including social relationship information improves the ability of a model to predict the semantic tone of human comments. However, Part II results show that even without including social context information in the prompt, LLM-generated comments and human comments are equally sensitive to social context, suggesting that LLMs can comprehend semantics from the original post alone. When we include all social relationship information in the prompt, the similarity between human comments and LLM-generated comments decreases. This inconsistency may occur because LLMs did not include social context information as part of their training data. Together these results demonstrate the ability of LLMs to comprehend semantics from the original post and respond similarly to human comments, but also highlights their limitations in generalizing personalized comments through prompting alone.
Dataset Dictionary Learning in a Wasserstein Space for Federated Domain Adaptation
Montesuma, Eduardo Fernandes, Castellon, Fabiola Espinoza, Mboula, Fred Ngolรจ, Mayoue, Aurรฉlien, Souloumiac, Antoine, Gouy-Pailler, Cรฉdric
Multi-Source Domain Adaptation (MSDA) is a challenging scenario where multiple related and heterogeneous source datasets must be adapted to an unlabeled target dataset. Conventional MSDA methods often overlook that data holders may have privacy concerns, hindering direct data sharing. In response, decentralized MSDA has emerged as a promising strategy to achieve adaptation without centralizing clients' data. Our work proposes a novel approach, Decentralized Dataset Dictionary Learning, to address this challenge. Our method leverages Wasserstein barycenters to model the distributional shift across multiple clients, enabling effective adaptation while preserving data privacy. Specifically, our algorithm expresses each client's underlying distribution as a Wasserstein barycenter of public atoms, weighted by private barycentric coordinates. Our approach ensures that the barycentric coordinates remain undisclosed throughout the adaptation process. Extensive experimentation across five visual domain adaptation benchmarks demonstrates the superiority of our strategy over existing decentralized MSDA techniques. Moreover, our method exhibits enhanced robustness to client parallelism while maintaining relative resilience compared to conventional decentralized MSDA methodologies.
I'm a professional gamer and people pay me thousands to finish games for them
If you grew up obsessed with gaming, them you were probably told by various relatives that you could never make a living playing video games all day. Yet while that might once have been true, there is now a growing industry of professional gamers for hire making serious money with their hard-earned skills. Marko Uslinkovski is a 36-year-old professional gamer from North Macedonia who makes a living beating games for people who don't have time to do it themselves. With a team of 50 'boosters' Marko told MailOnline his company, Captain Carry, can turnover between 30,000 to 50,000 in a good month. Marko told MailOnline: 'These new games are extremely difficult, so we're like the last ditch effort for people that are borderline giving up.' Marko Uslinkovski (pictured) is a 36-year-old professional gamer from North Macedonia who makes a living beating games for people who don't have time to do it themselves If you grew up obsessed with gaming, then you were probably told by various relatives that you'd never make a living playing video games all day (stock image) Like so many who end up with a life-long passion for video games, Marko was hooked from his very first taste.
rKAN: Rational Kolmogorov-Arnold Networks
The development of Kolmogorov-Arnold networks (KANs) marks a significant shift from traditional multi-layer perceptrons in deep learning. Initially, KANs employed B-spline curves as their primary basis function, but their inherent complexity posed implementation challenges. Consequently, researchers have explored alternative basis functions such as Wavelets, Polynomials, and Fractional functions. In this research, we explore the use of rational functions as a novel basis function for KANs. We propose two different approaches based on Pade approximation and rational Jacobi functions as trainable basis functions, establishing the rational KAN (rKAN). We then evaluate rKAN's performance in various deep learning and physics-informed tasks to demonstrate its practicality and effectiveness in function approximation.
Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift
Ge, Jiawei, Mukherjee, Debarghya, Fan, Jianqing
As machine learning models are increasingly deployed in dynamic environments, it becomes paramount to assess and quantify uncertainties associated with distribution shifts. A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance. The prediction interval, which captures the range of likely outcomes for a given prediction, serves as a crucial tool for characterizing uncertainties induced by their underlying distribution. In this paper, we propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain under unsupervised domain shift, under which we have labeled samples from a related source domain and unlabeled covariates from the target domain. Our analysis encompasses scenarios where the source and the target domain are related via i) a bounded density ratio, and ii) a measure-preserving transformation. Our proposed methodologies are computationally efficient and easy to implement. Beyond illustrating the performance of our method through a real-world dataset, we also delve into the theoretical details. This includes establishing rigorous theoretical guarantees, coupled with finite sample bounds, regarding the coverage and width of our prediction intervals. Our approach excels in practical applications and is underpinned by a solid theoretical framework, ensuring its reliability and effectiveness across diverse contexts.
Combined Compromise for Ideal Solution (CoCoFISo): a multi-criteria decision-making based on the CoCoSo method algorithm
Rasoanaivo, Rรดlin Gabriel, Yazdani, Morteza, Zaratรฉ, Pascale, Fateh, Amirhossein
Each decision-making tool should be tested and validated in real case studies to be practical and fit to global problems. The application of multi-criteria decision-making methods (MCDM) is currently a trend to rank alternatives. In the literature, there are several multi-criteria decision-making methods according to their classification. During our experimentation on the Combined Compromise Solution (CoCoSo) method, we encountered its limits for real cases. The authors examined the applicability of the CoCoFISo method (improved version of combined compromise solution), by a real case study in a university campus and compared the obtained results to other MCDMs such as Preference Ranking Organisation Method for Enrichment Evaluations (PROMETHEE), Weighted Sum Method (WSM) and Technique for Order Preference by Similarity to the Ideal Solution (TOPSIS). Our research finding indicates that CoCoSo is an applied method that has been developed to solve complex multi variable assessment problems, while CoCoFISo can improve the shortages observed in CoCoSo and deliver stable outcomes compared to other developed tools. The findings imply that application of CoCoFISo is suggested to decision makers, experts and researchers while they are facing practical challenges and sensitive questions regarding the utilization of a reliable decision-making method. Unlike many prior studies, the current version of CoCoSo is unique, original and is presented for the first time. Its performance was approved using several strategies and examinations.
AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents
Gioacchini, Luca, Siracusano, Giuseppe, Sanvito, Davide, Gashteovski, Kiril, Friede, David, Bifulco, Roberto, Lawrence, Carolin
The advances made by Large Language Models (LLMs) have led to the pursuit of LLM agents that can solve intricate, multi-step reasoning tasks. As with any research pursuit, benchmarking and evaluation are key corner stones to efficient and reliable progress. However, existing benchmarks are often narrow and simply compute overall task success. To face these issues, we propose AgentQuest -- a framework where (i) both benchmarks and metrics are modular and easily extensible through well documented and easy-to-use APIs; (ii) we offer two new evaluation metrics that can reliably track LLM agent progress while solving a task. We exemplify the utility of the metrics on two use cases wherein we identify common failure points and refine the agent architecture to obtain a significant performance increase. Together with the research community, we hope to extend AgentQuest further and therefore we make it available under https://github.com/nec-research/agentquest.
Accurately Predicting Probabilities of Safety-Critical Rare Events for Intelligent Systems
Bai, Ruoxuan, Yang, Jingxuan, Gong, Weiduo, Zhang, Yi, Lu, Qiujing, Feng, Shuo
Intelligent systems are increasingly integral to our daily lives, yet rare safety-critical events present significant latent threats to their practical deployment. Addressing this challenge hinges on accurately predicting the probability of safety-critical events occurring within a given time step from the current state, a metric we define as 'criticality'. The complexity of predicting criticality arises from the extreme data imbalance caused by rare events in high dimensional variables associated with the rare events, a challenge we refer to as the curse of rarity. Existing methods tend to be either overly conservative or prone to overlooking safety-critical events, thus struggling to achieve both high precision and recall rates, which severely limits their applicability. This study endeavors to develop a criticality prediction model that excels in both precision and recall rates for evaluating the criticality of safety-critical autonomous systems. We propose a multi-stage learning framework designed to progressively densify the dataset, mitigating the curse of rarity across stages. To validate our approach, we evaluate it in two cases: lunar lander and bipedal walker scenarios. The results demonstrate that our method surpasses traditional approaches, providing a more accurate and dependable assessment of criticality in intelligent systems.